{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8WD-8WRj6LMd"
      },
      "source": [
        "# Evaluation and observability for LLM applications\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p9M0ALVU6LMf"
      },
      "source": [
        "## Creating an account on Comet.com\n",
        "\n",
        "[Comet](https://www.comet.com/site?from=llm&utm_source=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm&=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) and grab you API Key.\n",
        "\n",
        "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm&utm_source=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) for more information."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "v9SlRqZ-6LMf"
      },
      "outputs": [],
      "source": [
        "# !pip install opik llama-index llama-index-agent-openai llama-index-llms-openai --upgrade --quiet"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "JkxWiGyq6LMf"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/akshaypachaar/.opik.config\n"
          ]
        }
      ],
      "source": [
        "import opik\n",
        "\n",
        "opik.configure(use_local=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EeKqvSRV6LMg"
      },
      "source": [
        "## Preparing our environment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Create a .env file with the following content:\n",
        "> OPENAI_API_KEY=your-api-key-here\n",
        "\n",
        "> COMET_API_KEY=your-comet-api-key-here\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from dotenv import load_dotenv\n",
        "\n",
        "load_dotenv()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VkfFVBHk6LMg"
      },
      "source": [
        "## Download some sample documents"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "p2-oOAst6LMg"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import requests\n",
        "\n",
        "# Create directory if it doesn't exist\n",
        "os.makedirs(\"./data/paul_graham/\", exist_ok=True)\n",
        "\n",
        "# Download the file using requests\n",
        "url = \"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt\"\n",
        "response = requests.get(url)\n",
        "with open(\"./data/paul_graham/paul_graham_essay.txt\", \"wb\") as f:\n",
        "    f.write(response.content)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Simple demo of logging with Opik"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "OPIK: Started logging traces to the \"Default Project\" project at https://www.comet.com/opik/akshayp/redirect/projects?name=Default%20Project.\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "2"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from opik import track\n",
        "\n",
        "@track\n",
        "def my_function(x: int) -> int:\n",
        "    return x + 1\n",
        "\n",
        "my_function(1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Tracking LLM calls with Opik"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Hello! How can I assist you today?\n"
          ]
        }
      ],
      "source": [
        "from opik.integrations.openai import track_openai\n",
        "from openai import OpenAI\n",
        "\n",
        "openai_client = OpenAI()\n",
        "openai_client = track_openai(openai_client)\n",
        "\n",
        "prompt=\"Hello, world!\"\n",
        "\n",
        "response = openai_client.chat.completions.create(\n",
        "    model=\"gpt-3.5-turbo\",\n",
        "    messages=[\n",
        "      {\"role\":\"user\", \"content\":prompt}\n",
        "    ]\n",
        ")\n",
        "\n",
        "print(response.choices[0].message.content)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "izXWKbWn6LMg"
      },
      "source": [
        "## Using LlamaIndex\n",
        "\n",
        "### Configuring the Opik <> LlamaIndex integration"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RiVT-X8f6LMg"
      },
      "source": [
        "You can use the Opik callback directly by calling:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "_Kk2adoM6LMh"
      },
      "outputs": [],
      "source": [
        "from llama_index.core import Settings\n",
        "from llama_index.core.callbacks import CallbackManager\n",
        "from opik.integrations.llama_index import LlamaIndexCallbackHandler\n",
        "\n",
        "# Set up a callback handler that will automatically log all LlamaIndex operations to Opik\n",
        "opik_callback_handler = LlamaIndexCallbackHandler()\n",
        "\n",
        "# Integrating this handler into LlamaIndex's settings\n",
        "Settings.callback_manager = CallbackManager([opik_callback_handler])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "REvv1oXl6LMh"
      },
      "source": [
        "Now that the callback handler is configured, all traces will automatically be logged to Opik."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup a simple LLamaIndex RAG pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "ocSMhPDh6LMh"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "OPIK: Started logging traces to the \"Default Project\" project at https://www.comet.com/opik/akshayp/redirect/projects?name=Default%20Project.\n"
          ]
        }
      ],
      "source": [
        "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
        "\n",
        "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n",
        "index = VectorStoreIndex.from_documents(documents)\n",
        "query_engine = index.as_query_engine()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ARzzPtfI6LMh"
      },
      "source": [
        "We can now query the index using the `query_engine` object:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "Oi_Ruw586LMh"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 in 9th grade, using an early version of Fortran. Later, the author transitioned to working with microcomputers, building a TRS-80 and writing programs on it, including simple games and a word processor.\n"
          ]
        }
      ],
      "source": [
        "response = query_engine.query(\"What did the author do growing up?\") \n",
        "print(response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VE_rhcnZ6LMh"
      },
      "source": [
        "You can now go to the Opik app to see the trace:\n",
        "\n",
        "![LlamaIndex trace in Opik](https://raw.githubusercontent.com/comet-ml/opik/main/apps/opik-documentation/documentation/static/img/cookbook/llamaIndex_cookbook.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 using Fortran in 9th grade. Later, the author transitioned to working with microcomputers, building simple games and a word processor on a TRS-80. Additionally, the author initially planned to study philosophy in college but switched to AI due to a lack of interest in philosophy courses.'"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "str(response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prepare data for evaluation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Load dataset and insert into Opik"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Question</th>\n",
              "      <th>Answer</th>\n",
              "      <th>Context</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>What was the very first programming language Paul Graham used when he began learning to program on the IBM 1401?</td>\n",
              "      <td>He used an early version of Fortran on the IBM 1401.</td>\n",
              "      <td>The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Which microcomputer did Paul Graham’s father finally agree to buy for him around 1980?</td>\n",
              "      <td>A TRS-80.</td>\n",
              "      <td>Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>What was the name of the startup Paul Graham co-founded that built software to create online stores?</td>\n",
              "      <td>Viaweb.</td>\n",
              "      <td>We started a new company we called Viaweb, after the fact that our software worked via the web, and we got $10,000 in seed funding from Idelle's husband Julian.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Which friend of Paul Graham was the person responsible for the 1988 Internet Worm?</td>\n",
              "      <td>Robert Tappan Morris (often referred to as “Robert Morris” or “Rtm” in the text).</td>\n",
              "      <td>I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he'd found such a spectacular way to get out of grad school.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>What was the title of the second Lisp book that Paul Graham wrote after finishing *On Lisp*?</td>\n",
              "      <td>*ANSI Common Lisp.*</td>\n",
              "      <td>So with my unerring nose for financial opportunity, I decided to write another book on Lisp. This would be a popular book, the sort of book that could be used as a textbook. I imagined myself living frugally off the royalties and spending all my time painting. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.)</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "                                                                                                           Question  \\\n",
              "0  What was the very first programming language Paul Graham used when he began learning to program on the IBM 1401?   \n",
              "1                            Which microcomputer did Paul Graham’s father finally agree to buy for him around 1980?   \n",
              "2              What was the name of the startup Paul Graham co-founded that built software to create online stores?   \n",
              "3                                Which friend of Paul Graham was the person responsible for the 1988 Internet Worm?   \n",
              "4                      What was the title of the second Lisp book that Paul Graham wrote after finishing *On Lisp*?   \n",
              "\n",
              "                                                                              Answer  \\\n",
              "0                               He used an early version of Fortran on the IBM 1401.   \n",
              "1                                                                          A TRS-80.   \n",
              "2                                                                            Viaweb.   \n",
              "3  Robert Tappan Morris (often referred to as “Robert Morris” or “Rtm” in the text).   \n",
              "4                                                                *ANSI Common Lisp.*   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                    Context  \n",
              "0                                                                                                                                                                          The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.  \n",
              "1                                                                                                                                                           Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough.  \n",
              "2                                                                                                                                                                                                          We started a new company we called Viaweb, after the fact that our software worked via the web, and we got $10,000 in seed funding from Idelle's husband Julian.  \n",
              "3                                                                                                                                                                                  I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he'd found such a spectacular way to get out of grad school.  \n",
              "4  So with my unerring nose for financial opportunity, I decided to write another book on Lisp. This would be a popular book, the sort of book that could be used as a textbook. I imagined myself living frugally off the royalties and spending all my time painting. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.)  "
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import pandas as pd\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "\n",
        "df = pd.read_csv(\"data/test.csv\")\n",
        "df.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create a dataset client\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {},
      "outputs": [],
      "source": [
        "from opik import Opik\n",
        "\n",
        "client = Opik()\n",
        "dataset = client.get_or_create_dataset(name=\"Test dataset\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Insert"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'input': 'What was the very first programming language Paul Graham used when he began learning to program on the IBM 1401?',\n",
              " 'expected_output': 'He used an early version of Fortran on the IBM 1401.',\n",
              " 'context': 'The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.'}"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "qa_pairs = [\n",
        "    {\"input\": row[\"Question\"], \"expected_output\": row[\"Answer\"], \"context\": row[\"Context\"]} \n",
        "    for _, row in df.iterrows()\n",
        "]\n",
        "qa_pairs[0]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [],
      "source": [
        "\n",
        "dataset.insert(qa_pairs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "LLM application"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {},
      "outputs": [],
      "source": [
        "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
        "\n",
        "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n",
        "index = VectorStoreIndex.from_documents(documents)\n",
        "\n",
        "query_engine = index.as_query_engine()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Track it with Opik"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {},
      "outputs": [],
      "source": [
        "from opik import track\n",
        "\n",
        "@track\n",
        "def my_llm_application(input: str) -> str:\n",
        "    response = query_engine.query(input)\n",
        "    return str(response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Track the LLM calls"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {},
      "outputs": [],
      "source": [
        "import openai\n",
        "from opik.integrations.openai import track_openai\n",
        "\n",
        "# Define the task to evaluate\n",
        "openai_client = track_openai(openai.OpenAI())\n",
        "\n",
        "MODEL = \"gpt-3.5-turbo\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define the evaluation task"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def evaluation_task(x):\n",
        "    return {\n",
        "        \"output\": my_llm_application(x['input'])\n",
        "    }"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create a dataset client\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {},
      "outputs": [],
      "source": [
        "from opik import Opik\n",
        "\n",
        "client = Opik()\n",
        "dataset = client.get_or_create_dataset(name=\"Test dataset\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define evaluation metrics"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from opik.evaluation.metrics import (\n",
        "    Hallucination,\n",
        "    AnswerRelevance,\n",
        "    ContextPrecision,\n",
        "    ContextRecall\n",
        ")\n",
        "\n",
        "\n",
        "# Define the metrics\n",
        "hallucination_metric = Hallucination()\n",
        "answer_relevance_metric = AnswerRelevance()\n",
        "context_precision_metric = ContextPrecision()\n",
        "context_recall_metric = ContextRecall() "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run evaluation"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Evaluation:   0%|          | 0/5 [00:00<?, ?it/s]/Users/akshaypachaar/miniconda3/envs/env_rag/lib/python3.10/site-packages/pydantic/main.py:426: UserWarning: Pydantic serializer warnings:\n",
            "  Expected `PromptTokensDetails` but got `dict` with value `{'audio_tokens': 0, 'cached_tokens': 0}` - serialized value may not be as expected\n",
            "  return self.__pydantic_serializer__.to_python(\n",
            "/Users/akshaypachaar/miniconda3/envs/env_rag/lib/python3.10/site-packages/pydantic/main.py:426: UserWarning: Pydantic serializer warnings:\n",
            "  Expected `PromptTokensDetails` but got `dict` with value `{'audio_tokens': 0, 'cached_tokens': 1024}` - serialized value may not be as expected\n",
            "  return self.__pydantic_serializer__.to_python(\n",
            "Evaluation: 100%|██████████| 5/5 [00:20<00:00,  4.17s/it]\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─ Test dataset (5 samples) ─────────────╮\n",
              "│                                        │\n",
              "│ <span style=\"font-weight: bold\">Total time:       </span> 00:00:21            │\n",
              "│ <span style=\"font-weight: bold\">Number of samples:</span> 5                   │\n",
              "│                                        │\n",
              "│ <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">hallucination_metric: 0.2000 (avg)</span>     │\n",
              "│ <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">answer_relevance_metric: 0.8200 (avg)</span>  │\n",
              "│ <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">context_precision_metric: 0.7200 (avg)</span> │\n",
              "│ <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">context_recall_metric: 0.7600 (avg)</span>    │\n",
              "│                                        │\n",
              "╰────────────────────────────────────────╯\n",
              "</pre>\n"
            ],
            "text/plain": [
              "╭─ Test dataset (5 samples) ─────────────╮\n",
              "│                                        │\n",
              "│ \u001b[1mTotal time:       \u001b[0m 00:00:21            │\n",
              "│ \u001b[1mNumber of samples:\u001b[0m 5                   │\n",
              "│                                        │\n",
              "│ \u001b[1;32mhallucination_metric: 0.2000 (avg)\u001b[0m     │\n",
              "│ \u001b[1;32manswer_relevance_metric: 0.8200 (avg)\u001b[0m  │\n",
              "│ \u001b[1;32mcontext_precision_metric: 0.7200 (avg)\u001b[0m │\n",
              "│ \u001b[1;32mcontext_recall_metric: 0.7600 (avg)\u001b[0m    │\n",
              "│                                        │\n",
              "╰────────────────────────────────────────╯\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Uploading results to Opik <span style=\"color: #808000; text-decoration-color: #808000\">...</span> \n",
              "</pre>\n"
            ],
            "text/plain": [
              "Uploading results to Opik \u001b[33m...\u001b[0m \n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">View the results <a href=\"https://www.comet.com/opik/akshayp/experiments/019459ad-9126-7083-8ec1-f1470d099888/compare?experiments=%5B%22019459e1-9329-7e12-b62c-9dd782e93d2d%22%5D\" target=\"_blank\">in your Opik dashboard</a>.\n",
              "</pre>\n"
            ],
            "text/plain": [
              "View the results \u001b]8;id=961125;https://www.comet.com/opik/akshayp/experiments/019459ad-9126-7083-8ec1-f1470d099888/compare?experiments=%5B%22019459e1-9329-7e12-b62c-9dd782e93d2d%22%5D\u001b\\in your Opik dashboard\u001b]8;;\u001b\\.\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from opik.evaluation import evaluate\n",
        "\n",
        "evaluation = evaluate(\n",
        "    dataset=dataset,\n",
        "    task=evaluation_task,\n",
        "    scoring_metrics=[hallucination_metric, answer_relevance_metric, context_precision_metric, context_recall_metric],\n",
        "    experiment_config={\n",
        "        \"model\": MODEL\n",
        "    }\n",
        ")"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "py312_llm_eval",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
